MetaGCD: Learning to Continually Learn in Generalized Category Discovery
In this paper, we consider a real-world scenario where a model that is
trained on pre-defined classes continually encounters unlabeled data that
contains both known and novel classes. The goal is to continually discover
novel classes while maintaining performance on known classes. We name this
setting Continual Generalized Category Discovery (C-GCD). Existing methods for
novel class discovery cannot directly handle the C-GCD setting due to some
unrealistic assumptions, such as the unlabeled data only containing novel
classes. Furthermore, they fail to discover novel classes in a continual
fashion. In this work, we lift all these assumptions and propose an approach,
called MetaGCD, that learns to incrementally discover novel classes with less forgetting.
Our proposed method uses a meta-learning framework and leverages the offline
labeled data to simulate the testing incremental learning process. A
meta-objective is defined over two conflicting learning objectives
to achieve novel class discovery without forgetting. Furthermore, a soft
neighborhood-based contrastive network is proposed to discriminate uncorrelated
images while attracting correlated images. We build strong baselines and
conduct extensive experiments on three widely used benchmarks to demonstrate
the superiority of our method.
Comment: This paper has been accepted by ICCV202
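As a concrete illustration, the soft neighborhood-based contrastive idea can be sketched as a weighted InfoNCE-style loss in which an anchor attracts its nearest neighbours in proportion to their similarity, while uncorrelated images are pushed away. This is a minimal sketch under assumed design choices (a top-k neighbourhood with similarity-proportional weights); the paper's exact formulation is not given in the abstract.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def soft_neighbor_contrastive_loss(feats, anchor, k=2, temperature=0.1):
    """Contrastive loss for one anchor over all other embeddings.

    Instead of a single hard positive, the anchor's k most similar
    candidates receive soft weights proportional to their similarity
    (an assumed weighting scheme), so correlated images are attracted
    softly rather than via a hard positive/negative split.
    """
    others = [j for j in range(len(feats)) if j != anchor]
    sims = {j: cosine(feats[anchor], feats[j]) for j in others}
    # soft neighbourhood: top-k most similar candidates, weights ~ similarity
    topk = sorted(others, key=lambda j: sims[j], reverse=True)[:k]
    total = sum(max(sims[j], 1e-8) for j in topk)
    weights = {j: max(sims[j], 1e-8) / total for j in topk}
    # InfoNCE-style log-probabilities over all candidates
    z = sum(math.exp(sims[j] / temperature) for j in others)
    return -sum(w * math.log(math.exp(sims[j] / temperature) / z)
                for j, w in weights.items())
```

A tight cluster of similar embeddings yields a small loss, while a dissimilar neighbourhood yields a large one, which is the attraction/repulsion behaviour the abstract describes.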
Domain Adaptive Attention Model for Unsupervised Cross-Domain Person Re-Identification
Person re-identification (Re-ID) across multiple datasets is a challenging
yet important task due to the possibly large distinctions between different
datasets and the lack of training samples in practical applications. This work
proposes a novel unsupervised domain adaptation framework that transfers
discriminative representations from the labeled source domain (dataset) to the
unlabeled target domain (dataset). We propose to formulate the domain
adaptation task as a one-class classification problem with a novel domain similarity
loss. Given the feature map of any image from a backbone network, a novel
domain adaptive attention model (DAAM) first automatically learns to separate
the feature map of an image to a domain-shared feature (DSH) map and a
domain-specific feature (DSP) map simultaneously. Specifically, a residual
attention mechanism is designed to model the DSP feature map and avoid negative
transfer. Then, a DSH branch and a DSP branch are introduced to learn DSH and
DSP feature maps respectively. To reduce the domain divergence caused by the
source and target datasets being collected from different environments, we
project the DSH feature maps from different domains into a new nominal domain,
and propose a novel domain similarity loss based on one-class classification.
In addition, a novel unsupervised person Re-ID loss is proposed to make full
use of the unlabeled target data. Extensive experiments on the
Market-1501 and DukeMTMC-reID benchmarks demonstrate state-of-the-art
performance of the proposed method. Code will be released to facilitate further
studies on the cross-domain person re-identification task.
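A minimal sketch of the residual-attention split and the one-class pull toward a nominal domain might look as follows. The attention values and the center-based one-class loss (in the style of Deep SVDD) are illustrative assumptions, not the paper's exact learned modules or losses.

```python
def split_feature_map(feature, attention):
    """Residual-attention split of a feature vector F.

    DSP = A * F   (attention picks out domain-specific content)
    DSH = F - DSP (the residual keeps the domain-shared content)

    Here `attention` is a hypothetical stand-in for the learned
    attention module's output, one weight per feature element.
    """
    dsp = [a * f for a, f in zip(attention, feature)]
    dsh = [f - p for f, p in zip(feature, dsp)]
    return dsh, dsp

def domain_similarity_loss(dsh_features, center):
    """One-class-style loss: pull DSH features from every domain toward a
    single nominal-domain center, so source and target overlap there."""
    return sum(sum((x - c) ** 2 for x, c in zip(f, center))
               for f in dsh_features) / len(dsh_features)
```

By construction the two parts sum back to the original feature map, so the residual branch cannot discard information; it only routes it between the shared and specific paths.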
CMTR: Cross-modality Transformer for Visible-infrared Person Re-identification
Visible-infrared cross-modality person re-identification is a challenging
ReID task, which aims to retrieve and match the same identity's images between
the heterogeneous visible and infrared modalities. Thus, the core of this task
is to bridge the huge gap between these two modalities. The existing
convolutional neural network-based methods mainly face the problem of
insufficient perception of each modality's information, and cannot learn good
discriminative modality-invariant embeddings for identities, which limits their
performance. To solve these problems, we propose a cross-modality
transformer-based method (CMTR) for the visible-infrared person
re-identification task, which can explicitly mine the information of each
modality and generate better discriminative features based on it. Specifically,
to capture each modality's characteristics, we design novel modality
embeddings, which are fused with token embeddings to encode modalities'
information. Furthermore, to enhance the representation of the modality embeddings and
adjust matching embeddings' distribution, we propose a modality-aware
enhancement loss based on the learned modalities' information, reducing
intra-class distance and enlarging inter-class distance. To our knowledge, this
is the first work to apply a transformer network to the cross-modality
re-identification task. We conduct extensive experiments on the public
SYSU-MM01 and RegDB datasets, and our proposed CMTR model's performance
significantly surpasses that of existing CNN-based methods.
Comment: 11 pages, 7 figures, 7 tables
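The modality-embedding fusion can be sketched as adding one learned vector per modality to every token embedding, analogous to how position embeddings are fused in a ViT-style encoder. The modality names, table layout, and fusion-by-addition below are assumptions made for illustration.

```python
# Hypothetical modality ids; the abstract only names the two modalities.
MODALITY_ID = {"visible": 0, "infrared": 1}

def fuse_modality_embedding(token_embeddings, modality, modality_table):
    """Add the learned per-modality vector to every token embedding.

    token_embeddings: list of token vectors for one image.
    modality_table:   one learned vector per modality (assumed shape).
    The encoder can then condition on which modality each image came from.
    """
    m = modality_table[MODALITY_ID[modality]]
    return [[t + e for t, e in zip(tok, m)] for tok in token_embeddings]
```

Because the same vector is shared by all tokens of an image, the encoder receives a consistent, image-level signal about the input modality, which is what lets the network explicitly mine each modality's information.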
Subgraph and object context‐masked network for scene graph generation
Scene graph generation aims to recognise objects and their semantic relationships in an image and can help computers understand visual scenes. To improve relationship prediction, geometry information is essential and is usually incorporated into relationship features. Existing methods use the coordinates of objects to encode their spatial layout; however, in this way they neglect the context of the objects. In this study, to make full use of spatial knowledge efficiently, the authors propose a novel subgraph and object context‐masked network (SOCNet), consisting of spatial mask relation inference (SMRI) and hierarchical message passing (HMP) modules, to address the scene graph generation task. In particular, to take advantage of spatial knowledge, SMRI masks part of the context of object features, depending on the spatial layout of the objects and the corresponding subgraph, to facilitate relationship recognition. To refine the features of objects and subgraphs, they also propose HMP, which passes highly correlated messages from both microcosmic and macroscopic aspects through a triple‐path structure comprising subgraph–subgraph, object–object, and subgraph–object paths. Finally, a statistical co‐occurrence probability is used to regularise relationship prediction. SOCNet integrates HMP and SMRI into a unified network, and comprehensive experiments on the Visual Relationship Detection and Visual Genome datasets indicate that SOCNet outperforms several state‐of‐the‐art methods on two common tasks.
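The hierarchical message passing can be pictured with a toy synchronous update in which each object mixes in the feature of its subgraph and each subgraph averages its member objects. The mixing weight `alpha` and the simple average are hypothetical simplifications of the paper's triple-path scheme, shown only to make the subgraph–object exchange concrete.

```python
def pass_messages(obj_feats, subgraph_feats, obj_to_subgraph, alpha=0.5):
    """One synchronous round of subgraph<->object message passing.

    obj_feats:       list of object feature vectors.
    subgraph_feats:  list of subgraph feature vectors.
    obj_to_subgraph: for each object index, the index of its subgraph.
    alpha:           assumed mixing weight for incoming messages.
    """
    # object update: mix in the feature of the subgraph the object belongs to
    new_obj = []
    for i, f in enumerate(obj_feats):
        s = subgraph_feats[obj_to_subgraph[i]]
        new_obj.append([(1 - alpha) * a + alpha * b for a, b in zip(f, s)])
    # subgraph update: mix in the average of the subgraph's member objects
    new_sub = []
    for k, s in enumerate(subgraph_feats):
        members = [obj_feats[i] for i in range(len(obj_feats))
                   if obj_to_subgraph[i] == k]
        if members:
            avg = [sum(col) / len(members) for col in zip(*members)]
            new_sub.append([(1 - alpha) * a + alpha * b for a, b in zip(s, avg)])
        else:
            new_sub.append(s)
    return new_obj, new_sub
```

Repeating such rounds propagates context along the subgraph–object paths; the full model additionally passes messages along object–object and subgraph–subgraph paths.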